Are There New BM25 Expectations?
Authors
Abstract
In this paper, we present some ideas on possible directions for a new interpretation of the Okapi BM25 ranking formula. In particular, we focus on a fully Bayesian approach to deriving a smoothed formula that takes into account a priori knowledge of term probabilities. Most efforts to improve BM25 have concentrated on capturing the language model (frequencies, lengths, etc.) and have overlooked the fact that the constant 0.5, used as a correction factor, is itself a parameter that can be modelled in a better way. This approach has been tested on a visual data mining tool, and the initial results are encouraging.
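To make the role of the 0.5 constant concrete, here is a minimal sketch of the IDF component of BM25 with the correction factor exposed as two pseudo-counts. It is not the derivation proposed in the paper: the function name `smoothed_idf` and the parameters `alpha` and `beta` are assumptions introduced for illustration. The idea is that if the probability that a document contains a term is given a Beta(alpha, beta) prior, its posterior mean is (n + alpha) / (N + alpha + beta), and the corresponding odds of absence are (N - n + beta) / (n + alpha). Setting alpha = beta = 0.5 recovers the usual formula.

```python
import math


def smoothed_idf(n_docs, doc_freq, alpha=0.5, beta=0.5):
    """IDF in the BM25 style, with the 0.5 correction exposed as pseudo-counts.

    alpha is added to the number of documents containing the term and
    beta to the number of documents not containing it (hypothetical
    parameter names). alpha = beta = 0.5 gives the classic BM25 IDF.
    """
    if not 0 <= doc_freq <= n_docs:
        raise ValueError("doc_freq must lie between 0 and n_docs")
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    return math.log((n_docs - doc_freq + beta) / (doc_freq + alpha))


# Classic correction factor versus a stronger, purely illustrative prior.
classic = smoothed_idf(n_docs=1000, doc_freq=10)
stronger_prior = smoothed_idf(n_docs=1000, doc_freq=10, alpha=5.0, beta=5.0)
print(classic, stronger_prior)
```

With a larger symmetric prior, the weight of a rare term is pulled down, because the pseudo-counts dominate a small document frequency. How the prior should actually be chosen is exactly the question the abstract raises, and this sketch does not answer it.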
Similar resources
Choosing document structure weights
Existing ranking schemes assume all term occurrences in a given document are of equal influence. Intuitively, terms occurring in some places should have a greater influence than those elsewhere. An occurrence in an abstract may be more important than an occurrence in the body text. Although this observation is not new, there remains the issue of finding good weights for each structure. Vector s...
Effect of nutrient limitation of cyanobacteria on protease inhibitor production and fitness of Daphnia magna.
Herbivore-plant interactions have been well studied in both terrestrial and aquatic ecosystems as they are crucial for the trophic transfer of energy and matter. In nutrient-rich freshwater ecosystems, the interaction between primary producers and herbivores is to a large extent represented by Daphnia and cyanobacteria. The occurrence of cyanobacterial blooms in lakes and ponds has, at least pa...
Can We Get A Better Retrieval Function From Machine?
The quality of an information retrieval system heavily depends on its retrieval function, which returns a similarity measurement between the query and each document in the collection. Documents are sorted according to their similarity values with the query and those with high rank are assumed to be relevant. Okapi BM25 and its variations are very popular retrieval functions and they seem to b...
Improving the Sentiment Analysis Process of Spanish Tweets with BM25
The enormous growth of user-generated information of social networks has caused the need for new algorithms and methods for their classification. The Sentiment Analysis (SA) methods attempt to identify the polarity of a text, using among other resources, the ranking algorithms. One of the most popular ranking algorithms is the Okapi BM25 ranking, designed to rank documents according to their re...
Adapting Document Ranking to Users' Preferences Using Click-Through Data
This paper proposes a new approach to ranking the documents retrieved by a search engine using click-through data. The goal is to make the final ranked list of documents accurately represent users’ preferences reflected in the click-through data. Our approach combines the ranking result of a traditional IR algorithm (BM25)...